Configuring traffic optimization using distributed edge services

ABSTRACT

Some embodiments provide a novel method for configuring managed forwarding elements (MFEs) to handle data messages for multiple logical networks that are implemented in a data center at the MFEs and to provide gateway service processing (e.g., firewall, DNS, etc.). A controller, in some embodiments, identifies logical networks implemented in the datacenter and MFEs available to provide gateway service processing and assigns gateway service processing for each logical network to a particular MFE. The MFEs, in some embodiments, receive data messages from endpoints in the logical networks that are destined for an external network. In some embodiments, the MFEs identify that the data messages require gateway service processing before being sent to the external network. The MFEs, in some embodiments, identify a particular MFE that is assigned to provide the gateway service processing for logical networks associated with the data messages.

BACKGROUND

An edge device in a datacenter may have several functionalities, including applying services such as virtual private network (VPN), network address translation (NAT), edge firewall, etc. for packets entering or leaving the datacenter. When there is a high volume of north-south traffic (i.e., traffic entering or exiting) generated in the datacenter, such an edge device can become a bottleneck. As such, there is a need for solutions that alleviate this bottleneck while still being able to provide edge services in a datacenter.

BRIEF SUMMARY

Some embodiments provide a novel method for handling data messages for logical networks that are implemented in a data center by having managed forwarding elements (MFEs) provide gateway service processing (e.g., firewall, DNS, etc.). In some embodiments, the MFEs receive data messages, sent from endpoints in the logical networks, that are destined for external networks. When an MFE receiving such a data message identifies that the data message requires gateway service processing before being sent to the external network, the MFE identifies a particular MFE (either the same MFE or a different MFE in the datacenter) that is assigned to provide the gateway service processing for the logical network associated with the data message. If the MFE that receives the data message is also the MFE assigned to provide gateway service processing for the logical network associated with the data message, then this MFE provides the gateway service processing and forwards the data message to a datacenter router that provides access to the external network. If a different MFE is assigned to provide gateway service processing for the logical network associated with the data message, the MFE forwards the data message to that different MFE for the different MFE to provide the gateway service processing and to forward the data message to the datacenter router that provides access to the external network.

In some embodiments, the MFEs are configured to provide the gateway service processing by a network control system (e.g., a network controller and/or network manager, or cluster of network controllers and/or managers). The network control system, in some embodiments, assigns the gateway service processing for different logical networks to different MFEs. In some embodiments, logical networks for which certain edge services (e.g., VPN or network address translation (NAT)) are required are assigned to edge nodes that provide centralized gateway service processing, instead of being assigned to the distributed MFEs. The assignment of logical networks to MFEs, in some embodiments, is a load balancing operation that takes into account the capacity of the different MFEs (and the hosts on which they execute) to handle additional processing. A single MFE may be assigned multiple logical networks for which that MFE provides gateway service processing. In some embodiments, the network control system configures the MFEs to perform the gateway service processing and to identify the MFE assigned to each logical network. The network control system provides processing rules to each MFE (e.g., firewall rules) for the logical networks assigned to the MFE and policy-based routing entries used to identify the MFE assigned to a particular logical network.

The MFEs of some embodiments execute on the same machines as endpoints of the logical networks. In some embodiments, the MFEs execute in virtualization software (e.g., a hypervisor) of a host computer. In some embodiments, when a different MFE is identified as the MFE assigned to provide gateway service processing for a logical network, the MFE that received the data message forwards the data message to the different MFE through a tunnel. These tunnels may use virtual extensible local area network (VXLAN) encapsulation, Generic Network Virtualization Encapsulation (GENEVE), or other types of encapsulation. In some embodiments, the logical networks span multiple datacenters (e.g., customer sites) that are connected so that communication between datacenters does not require network address translation (NAT), virtual private networks (VPN), or IPsec encapsulation and gateway service processes can therefore be distributed to MFEs instead of centrally provided at a dedicated edge node. This can prevent an edge node from becoming a bottleneck for north-south traffic.

The preceding Summary is intended to serve as a brief introduction to some embodiments of the invention. It is not meant to be an introduction or overview of all inventive subject matter disclosed in this document. The Detailed Description that follows and the Drawings that are referred to in the Detailed Description will further describe the embodiments described in the Summary as well as other embodiments. Accordingly, to understand all the embodiments described by this document, a full review of the Summary, Detailed Description, the Drawings, and the Claims is needed. Moreover, the claimed subject matters are not to be limited by the illustrative details in the Summary, Detailed Description, and the Drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth in the appended claims. However, for purposes of explanation, several embodiments of the invention are set forth in the following figures.

FIG. 1 illustrates an exemplary environment in which the invention is implemented.

FIG. 2 conceptually illustrates a process that is performed to assign gateway service processing for different logical networks to MFEs in the datacenter.

FIG. 3 illustrates a set of controllers sending configuration data to hosts in a datacenter.

FIG. 4 conceptually illustrates a process for processing a data message.

FIGS. 5A-B illustrate data messages that receive gateway service processing at an MFE executing on the same host computer as the source of a first data message and destination of a return data message.

FIGS. 6A-B illustrate data messages that receive gateway service processing at an MFE executing on a different host computer than the source of a first data message and destination of a return data message.

FIG. 7 illustrates a process for processing a data message destined for an external network through a distributed logical router and an edge services gateway port.

FIG. 8 conceptually illustrates a computer system with which some embodiments of the invention are implemented.

DETAILED DESCRIPTION

In the following detailed description of the invention, numerous details, examples, and embodiments of the invention are set forth and described. However, it will be clear and apparent to one skilled in the art that the invention is not limited to the embodiments set forth and that the invention may be practiced without some of the specific details and examples discussed.

As used in this document, a data message refers to a collection of bits in a particular format sent across a network. Also, as used in this document, a data flow refers to a set of data messages sharing a set of attributes (e.g., a five-tuple) even if the shared set of attributes has source and destination values switched for different directions of communication (i.e., from a first machine to a second machine and from the second machine back to the first machine). Data flows (or flows) as used in this document, in some instances, refer to one half of a communication between two machines (i.e., a flow refers, in some cases, to the communication from one machine to another machine in one direction). One of ordinary skill in the art will recognize that the term data message may be used herein to refer to various formatted collections of bits that may be sent across a network, such as Ethernet frames, IP packets, TCP segments, UDP datagrams, etc. Also, as used in this document, references to L2, L3, L4, and L7 layers (or layer 2, layer 3, layer 4, layer 7) are references, respectively, to the second data link layer, the third network layer, the fourth transport layer, and the seventh application layer of the OSI (Open Systems Interconnection) layer model.

Some embodiments provide a novel method for handling data messages for logical networks (e.g., logical switches, or sets of logical forwarding elements) that are implemented in a data center by having managed forwarding elements (MFEs) provide gateway service processing (e.g., firewall, DNS, etc.). In some embodiments, the MFEs receive data messages, sent from endpoints in the logical networks, that are destined for external networks. When an MFE receiving such a data message identifies that the data message requires gateway service processing before being sent to the external network, the MFE identifies a particular MFE (either the same MFE or a different MFE in the datacenter) that is assigned to provide the gateway service processing for the logical network associated with the data message. If the MFE that receives the data message is also the MFE assigned to provide gateway service processing for the logical network associated with the data message, then this MFE provides the gateway service processing and forwards the data message to a datacenter router that provides access to the external network. If a different MFE is assigned to provide gateway service processing for the logical network associated with the data message, the MFE forwards the data message to that different MFE for the different MFE to provide the gateway service processing and to forward the data message to the datacenter router that provides access to the external network. In some embodiments, a set of the logical networks are logical switches which are each assigned an MFE to provide gateway service processing.

FIG. 1 illustrates an exemplary environment in which the invention of some embodiments is implemented. FIG. 1 includes two datacenters 120 connected by a direct connection 140. The direct connection 140 is a connection through an external network 155 over which the edge routers 130 provided by the datacenters do not require centralized edge services such as network address translation (NAT) or IPsec encapsulation for inter-datacenter communication. A plurality of host computers 100 and a set of controller computers 115 are included in the datacenters 120 and are connected by internal networks 150. The host computers host end machines (data compute nodes (DCNs) such as virtual machines 105, containers, namespaces, etc.) and execute virtualization software 110 that includes a software forwarding element (or set of software forwarding elements) that is referred to herein as a managed forwarding element (MFE) 112. In different embodiments, the MFE on a particular host may be one software forwarding element that implements multiple logical switches and/or logical routers, or multiple separate software forwarding elements executing in the virtualization software (e.g., one or more virtual switches and/or virtual routers). In some embodiments, the virtualization software is a hypervisor on top of which the end machines execute.

The MFEs 112 are capable of communicating directly with edge router 130 to send messages to other datacenters or to the external network 155. Within each datacenter, the controller computers 115 and a set of network manager computers (not shown) control the host machines 100 to implement a set of logical networks by configuring the virtualization software 110 including the MFE 112 to perform logical processing for the logical networks. In some embodiments, the MFEs are configured to perform first-hop processing on data messages. That is, the first MFE that receives a data message from an end machine (i.e., the MFE executing on the same host computer as the source end machine of the data message) performs logical processing for all logical forwarding elements (e.g., logical switching for a source logical switch, logical routing for a logical router, and logical switching for a destination logical switch) along a logical path to a destination machine. As will be discussed below, in some embodiments this logical processing performed by the MFEs also includes gateway service processing.

In some embodiments, the MFEs are configured to provide the gateway service processing by a network control system (e.g., a network controller and/or network manager, or cluster of network controllers and/or managers). The controllers 115 in FIG. 1 represent such a network control system; it should be understood that while this figure shows controllers 115, these could be management plane applications or computers, network controller applications or computers, or combinations thereof. The network control system, in some embodiments, assigns the gateway service processing for different logical networks to different MFEs. In some embodiments, logical networks for which certain edge services (e.g., VPN or network address translation (NAT)) are required are assigned to edge nodes that provide centralized gateway service processing, instead of being assigned to the distributed MFEs. Centralized gateway service processing, in some embodiments, includes any or all of network address translation (NAT) for multiple logical networks sharing a same external IP address, multiple logical networks sharing a same virtual private network (VPN) IP address, or multiple logical networks using IPsec encapsulation before being sent to a provider edge router.

The assignment of logical networks to MFEs, in some embodiments, is a load balancing operation that takes into account the capacity of the different MFEs (and the hosts on which they execute) to handle additional processing. A single MFE may be assigned multiple logical networks for which that MFE provides gateway service processing. In some embodiments, the network control system configures the MFEs to perform the gateway service processing and to identify the MFE assigned to each logical network. The network control system provides processing rules to each MFE (e.g., firewall rules) for the logical networks assigned to the MFE and policy-based routing entries used to identify the MFE assigned to a particular logical network.

FIG. 2 conceptually illustrates a process 200 that is performed to assign gateway service processing for different logical networks to MFEs in the datacenter. In some embodiments, the process 200 is performed by a network control system (e.g., controller computers 115). The process 200 begins by identifying (at 210) the logical networks implemented in the datacenter. In some embodiments, identifying the logical networks comprises querying a network manager. In addition to identifying the existence of the logical networks, in some embodiments the controller identifies a set of gateway services applied for each identified logical network. As mentioned above, in some embodiments each logical switch is treated as a separate logical network for the purposes of assigning gateway processing to different MFEs. That is, a single logical router might have multiple logical switches (with different logical subnets) connected. In some embodiments, each of these logical switches (i.e., logical subnets) may be assigned to a different MFE for gateway service processing.

Next, the process 200 identifies (at 220) a set of MFEs that are executing in the datacenter. The set of MFEs, in some embodiments, are MFEs that are directly connected to a provider edge router that connects the datacenter to external networks. In some embodiments, identifying the MFEs includes identifying characteristics of the MFEs that relate to the capacity of the MFE (or the host on which the MFE executes) to process north-south data messages (processing power, current load, NIC speed, etc.). That is, some MFEs may not be able to process these north-south data messages because they execute on host computers that either do not have the resources to perform edge services or are not connected to a provider edge router.

After identifying (at 220) the set of MFEs, the process 200 selects (at 230) a logical network for which to assign gateway services to one of the MFEs in the identified set of MFEs executing in the datacenter (if the gateway services are eligible for assignment to such an MFE). The logical networks, in some embodiments, are identified by a logical network identifier (e.g., VLAN or VXLAN ID) or a subnet associated with the logical network.

For the selected logical network, the process 200 determines (at 240) whether the logical network requires gateway service processing that can be performed at an MFE. If gateway service processing can be performed at an MFE (i.e., does not need to be performed at an edge node), the process 200 assigns (at 250) the gateway service processing for the logical network to a particular MFE. In some embodiments, the assignment is based on a load balancing decision. The load balancing decision, in some embodiments, is a hash-based decision that determines an MFE from among the identified MFEs to which the gateway service processing for the logical network is assigned based on a hash of the logical network identifier. In some embodiments, the association between a set of hash values and particular MFEs is based on resources available to each MFE such that MFEs with more resources are associated with more hash values. In addition, some embodiments restrict the possible MFEs that are eligible to perform gateway service processing for a particular logical network to those MFEs that are already required to implement distributed aspects of that logical network (e.g., logical switches and/or distributed logical routers). Once an MFE is assigned to provide gateway service processing, the process 200 proceeds to determine (at 260) whether there are additional logical networks to assign to MFEs.

On the other hand, if the process 200 determines (at 240) that the logical network requires gateway service processing that cannot be performed at an MFE, the process 200 assigns (at 255) the gateway processing for the logical network to an edge node that provides gateway service processing for logical networks that cannot have gateway service processing distributed to the MFEs. In certain cases, multiple logical networks need to have all of their north-south traffic processed at the same ingress/egress point, and thus these networks are assigned to an edge node. For instance, if multiple logical networks share the same NAT IP address or use the same VPN IP address, then the gateway service processing for the flows belonging to these networks is assigned to an edge node rather than being balanced across multiple MFEs in some embodiments. As an example, two or more logical switches that are attached to the same logical router might have different logical subnets but share the same VPN IP for external traffic.

After assigning (at either 250 or 255) the gateway service processing to either an MFE or gateway device, the process 200 proceeds to determine (at 260) whether there are additional logical networks to assign to MFEs. If the process 200 determines (at 260) that there are additional logical networks to assign to MFEs, the process 200 selects (at 230) a next logical network for which to assign gateway service processing to an MFE in the set of MFEs.
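For purposes of illustration only, the assignment loop of process 200 (operations 230-260) can be sketched in Python as follows. The sketch is not taken from any embodiment: the service labels, the use of SHA-256 as the hash, the bucket count, the integer weights standing in for "resources available to each MFE", and all function and variable names are assumptions made for the example.

```python
import hashlib

# Hypothetical labels for services that must stay centralized at an edge node.
CENTRALIZED_SERVICES = frozenset({"nat", "vpn", "ipsec"})


def build_bucket_table(mfe_weights, num_buckets=64):
    """Associate hash buckets with MFEs in proportion to each MFE's weight.

    MFEs with more resources (a larger weight) receive more buckets.
    """
    total = sum(mfe_weights.values())
    table = []
    for mfe, weight in sorted(mfe_weights.items()):
        share = max(1, round(num_buckets * weight / total))
        table.extend([mfe] * share)
    return table


def assign_gateway_processing(logical_networks, mfe_weights, edge_node="edge-node"):
    """Return a mapping of logical network identifier -> assigned MFE or edge node."""
    table = build_bucket_table(mfe_weights)
    assignments = {}
    for ln_id, services in logical_networks.items():      # select (230)
        if CENTRALIZED_SERVICES & set(services):          # determine (240)
            assignments[ln_id] = edge_node                # assign to edge node (255)
            continue
        # Hash the logical network identifier to pick a bucket (250).
        digest = hashlib.sha256(str(ln_id).encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % len(table)
        assignments[ln_id] = table[bucket]
    return assignments


if __name__ == "__main__":
    networks = {5001: {"firewall"}, 5002: {"firewall", "nat"}, 5003: {"dns"}}
    print(assign_gateway_processing(networks, {"mfe-a": 3, "mfe-b": 1}))
```

Because the hash depends only on the logical network identifier and the bucket table, repeated runs with the same inputs yield the same assignment, which is one reason a hash-based decision may be convenient for this kind of load balancing.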

After all logical networks for which gateway service processing can be provided by the MFEs have been assigned to an MFE, the process 200 generates (at 270) configuration data for configuring the MFEs to implement the assigned gateway service processing and sends the configuration data to the host computers on which the MFEs execute. In some embodiments, separate configuration data is generated for each MFE. The configuration data, in some embodiments, includes a set of policy based routing rules that cause MFEs to forward data messages to the MFEs assigned to process data messages for a logical network associated with the data messages. In some embodiments, a host (e.g., an MFE on the host) is configured to implement a distributed router and an edge services gateway port that are discussed in more detail in relation to FIG. 7.

In some embodiments, the policy based routing rules specify routes for data messages for a logical network based on at least one of a logical network identifier (e.g., VLAN or VXLAN tag), a set of IP addresses (e.g., an IP subnet) associated with the logical network, and a set of MAC addresses associated with the logical network. In some embodiments, the configuration data includes a set of associations between particular logical networks and particular MFEs assigned to provide gateway service processing for the logical networks. The set of associations may use any combination of logical network identifiers, IP addresses (e.g., subnets), and MAC addresses. For example, in some embodiments, a set of policy-based routing rules (e.g., src 192.168.1.0/24, dst 0/0→HYP1_VTEP; src 192.168.2.0/24, dst 0/0→HYP2_VTEP) is configured in an edge services gateway port on a distributed router to which outgoing data messages are forwarded. In some embodiments, the associations are embedded in a set of routing entries used to configure the MFE.
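As a hedged illustration of how the example policy-based routing rules above might be evaluated, the following Python sketch matches a data message's source and destination IP addresses against the two example rules and returns the associated VTEP name. The PbrRule type, the first-match evaluation order, and the lookup function are assumptions made for the example; matching on logical network identifiers or MAC addresses is omitted.

```python
import ipaddress
from typing import NamedTuple, Optional


class PbrRule(NamedTuple):
    """Hypothetical representation of one policy-based routing rule."""
    src: ipaddress.IPv4Network
    dst: ipaddress.IPv4Network
    next_hop: str


# The two example rules from the text; "0/0" is written out as 0.0.0.0/0.
RULES = [
    PbrRule(ipaddress.ip_network("192.168.1.0/24"),
            ipaddress.ip_network("0.0.0.0/0"), "HYP1_VTEP"),
    PbrRule(ipaddress.ip_network("192.168.2.0/24"),
            ipaddress.ip_network("0.0.0.0/0"), "HYP2_VTEP"),
]


def lookup(src_ip: str, dst_ip: str, rules=RULES) -> Optional[str]:
    """Return the next hop of the first matching rule, or None if no rule applies."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in rules:
        if src in rule.src and dst in rule.dst:
            return rule.next_hop
    return None


if __name__ == "__main__":
    print(lookup("192.168.1.7", "8.8.8.8"))
    print(lookup("10.0.0.1", "8.8.8.8"))
```

In this sketch, a return value of None corresponds to the case in which no policy-based routing rule applies to the logical network of the data message.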

The configuration data for a particular MFE, in some embodiments, also includes data for implementing the gateway service processing for the logical networks assigned to the particular MFE. In some embodiments, each MFE receives data (e.g., service rules) for implementing the gateway service processing for all the logical networks. In some embodiments, the gateway service processing includes a logical firewall for a particular logical network and the configuration data includes a set of firewall rules for the particular logical network. A single MFE, in some embodiments, may be assigned multiple logical networks for which it provides gateway service processing. In some embodiments, different gateway services are provided for different logical networks at the same MFE. For example, a first logical network may require a logical firewall while a second logical network requires DNS services and the configuration data for each service is provided to the MFE.

Configuration data, in some embodiments, is sent to a controller proxy module on a host computer that interacts with the network control system (e.g., a set of controller and/or manager computers) to configure other modules on the host computer to implement the configuration data sent from the network control system. In some embodiments, the host configures the MFE to implement the configuration data by creating entries in a routing table or policy based routing rule set based on the configuration data received from the network control system. The host computer also configures the MFEs, in some embodiments, with the gateway service processing rules or information for the logical networks assigned to the MFE to provide gateway service processing. In some embodiments, process 200 is performed initially, and operations 220, 230, 240, 250 or 255, and 270 are performed on the creation (or implementation) of a new logical network in the datacenter. Additionally, if an MFE assigned to perform gateway service processing for a set of logical networks is removed from the network, operations 220, 230, and 250-270 are performed to reassign the gateway service processing for the set of logical networks among the remaining MFEs. If a new MFE is added to the network, some embodiments reassign the logical networks to include the new MFE. Other embodiments wait until new logical networks are created to include the new MFE in the gateway processing assignment.
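The reassignment performed when an MFE is removed can be sketched, under stated assumptions, as follows. The text does not specify how the affected logical networks are redistributed; this example assumes a simple least-loaded policy in place of the hash-based decision, and the function and variable names are hypothetical.

```python
def reassign_on_removal(assignments, removed_mfe, remaining_mfes):
    """Move logical networks owned by removed_mfe onto the remaining MFEs.

    Assumption: each orphaned network goes to the remaining MFE that currently
    serves the fewest logical networks (ties broken by name). Assignments to
    other MFEs or to an edge node are left untouched.
    """
    load = {mfe: 0 for mfe in remaining_mfes}
    for owner in assignments.values():
        if owner in load:
            load[owner] += 1

    updated = dict(assignments)
    orphaned = sorted(ln for ln, owner in assignments.items() if owner == removed_mfe)
    for ln_id in orphaned:
        target = min(remaining_mfes, key=lambda mfe: (load[mfe], mfe))
        updated[ln_id] = target
        load[target] += 1
    return updated


if __name__ == "__main__":
    current = {"ln1": "mfe-1", "ln2": "mfe-2", "ln3": "mfe-2", "ln4": "mfe-3"}
    print(reassign_on_removal(current, "mfe-2", ["mfe-1", "mfe-3"]))
```

After such a reassignment, new configuration data would still need to be generated and sent to the hosts (operation 270) so that the policy-based routing rules point at the new MFEs.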

FIG. 3 illustrates a set of controllers sending configuration data to hosts in a datacenter. FIG. 3 illustrates a set of controllers 315 and a set of host computers 300 in a datacenter 320. The host computers 300 execute VMs 305 and virtualization software 310 including MFE 312. As discussed above, the set of controllers generates sets of configuration data 360 according to process 200. The sets of configuration data 360, in some embodiments, include a separate set of configuration data 360 for each of a set of hosts 300 in the datacenter 320.

FIG. 3 illustrates that different sets of configuration data 360A and 360B are sent to different hosts to configure the different MFEs to provide gateway service processing for different logical networks (e.g., logical networks 1 and 2 (LN1 and LN2) respectively). In some embodiments, the different sets of configuration data 360 also include configuration data that is common to all (or many) of the MFEs that is used to forward data messages for different logical networks. This common configuration data can include data specifying MFEs assigned to provide gateway service processing for each logical network, which is provided to each MFE that performs processing for that logical network. The common configuration data, in some embodiments, are a set of forwarding rules or policy-based rules that are used to identify the MFE assigned to provide gateway service processing for a particular logical network. In some embodiments, the forwarding configuration data is sent separately from the specific configuration data (e.g., 360A and 360B) for configuring the MFE to provide the gateway service processing to logical networks assigned to the MFE.

The separate sets of configuration data, in some embodiments, include a set of policy based routing rules identifying MFEs/hosts associated with each logical network. In some embodiments, the policy based routing rules specify at least one of a logical network identifier (e.g., VLAN or VXLAN tag), a set of IP addresses (e.g., an IP subnet) associated with the logical network, and a set of MAC addresses associated with the logical network. In general, a policy-based routing rule can be based on any information contained in a header field of a received data message. In some embodiments, the configuration data includes a set of associations between particular logical networks and particular MFEs assigned to provide gateway service processing for the logical networks. The set of associations, in some embodiments, use any combination of logical network identifiers, IP addresses (e.g., subnets), and MAC addresses. For example, in some embodiments, a set of policy-based routing rules (e.g., src 192.168.1.0/24, dst 0/0→HYP1_VTEP; src 192.168.2.0/24, dst 0/0→HYP2_VTEP) is configured in an edge services gateway port on a distributed router to which outgoing data messages are forwarded. In some embodiments, the associations are embedded in a set of routing entries used to configure the MFE.

The configuration data for a particular MFE, in some embodiments, also includes data for implementing the gateway service processing for the logical networks assigned to the particular MFE. In some embodiments, the gateway service processing includes a set of gateway services (e.g., a logical firewall or domain name service) for a particular logical network and the configuration data includes a set of configuration data for implementing the set of services for the particular logical network. A single MFE, in some embodiments, is assigned multiple logical networks for which it provides gateway service processing. In some embodiments, different gateway services are provided for different logical networks at the same MFE. For example, a first logical network may require a logical firewall while a second logical network requires DNS services and the configuration data for each service is provided to the MFE.

In some embodiments, the configuration data specifies an edge services gateway port of a distributed logical router implemented by each MFE (or the virtualization software). When a first end machine port associated with a logical network is instantiated on a host computer in some embodiments, a policy (e.g., a policy based routing rule) is created (or configured) on the edge services gateway port that applies to traffic from the logical network. For example, a policy might be created that applies to a source subnet (e.g., 192.168.1.0/24) that is associated with the logical network and for a set of destination addresses (e.g., src 192.168.1.0/24, dst 0/0→MFE_VTEP) that specifies forwarding the data message to, for example, the VXLAN tunnel endpoint (VTEP) IP address (MFE_VTEP) associated with the MFE assigned for the logical network. On an MFE that provides the gateway service processing for the logical network, the MFE is configured with a policy-based routing rule in some embodiments (e.g., src 192.168.1.0/24, dst 0/0→apply egress firewall→PR) to provide the gateway service processing and to forward data messages with a source address in a subnet associated with the logical network to a provider edge router (at an IP address ‘PR’). A route for traffic entering the network at the MFE, in some embodiments, is specified for the traffic destined for the subnet associated with the logical network and specifies an action and a destination port (e.g., src 0/0, dst 192.168.1.0/24→apply ingress firewall→DLR), such that an incoming data message has the ingress firewall rules applied and is then forwarded to a distributed logical router port (DLR) for east-west processing. In some embodiments, these policy-based rules or routing entries are used in implementing a distributed logical router and the MFE assigned to provide the gateway service processing is identified by a VXLAN tunnel endpoint (VTEP) IP address to which the data message should be tunneled. Distributed logical routers are described in more detail in United States Patent Application No. 2016/0226700, which is hereby incorporated by reference. By configuring each MFE with routes (or policy-based routing rules) for data messages of the different logical networks, an MFE on a host computer to which an end machine migrates does not require new configuration specific to the migrated machine.

As mentioned, the MFEs use this configuration data to process data messages sent to and from the logical network endpoints in the datacenter. When an MFE receives a data message from a logical network endpoint, that MFE (i) determines whether the data message requires gateway processing and, if so, (ii) determines whether to perform gateway processing locally or send the data message to another MFE (or edge node) for the gateway processing, depending on where the gateway service processing is performed for the particular logical network with which the data message is associated.
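The two-step decision described above can be illustrated with the following minimal Python sketch. It assumes that the logical network is identified solely by the source IP subnet and that each rule maps a subnet to the VTEP of the assigned MFE; the action labels, the gateway_action function, and the rule format are hypothetical and are not drawn from any embodiment.

```python
import ipaddress


def gateway_action(src_ip, pbr_rules, local_vtep):
    """Decide how an MFE handles an outgoing data message.

    pbr_rules is a list of (source subnet, assigned VTEP) pairs.
    Returns a (action, target) tuple:
      - no matching rule: no gateway service processing is required;
      - rule points at this MFE: process locally, then send to the provider edge router;
      - rule points elsewhere: tunnel the data message to the assigned MFE.
    """
    src = ipaddress.ip_address(src_ip)
    for subnet, vtep in pbr_rules:
        if src in ipaddress.ip_network(subnet):
            if vtep == local_vtep:
                return ("process_locally", "provider_edge_router")
            return ("tunnel", vtep)
    return ("forward_without_gateway_processing", None)


if __name__ == "__main__":
    rules = [("192.168.1.0/24", "HYP1_VTEP"), ("192.168.2.0/24", "HYP2_VTEP")]
    # Evaluated from the point of view of the MFE whose VTEP is HYP1_VTEP.
    print(gateway_action("192.168.1.5", rules, "HYP1_VTEP"))
    print(gateway_action("192.168.2.5", rules, "HYP1_VTEP"))
```

In the sketch, the same rule set can be installed on every MFE; only the local_vtep value differs per host, which mirrors the idea that an MFE needs no machine-specific configuration when an end machine migrates to its host.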

FIG. 4 conceptually illustrates a process 400 for processing a data message. In some embodiments, the process 400 is performed by an MFE executing on a host computer. The host computer, in some embodiments, executes a virtualization software in which the MFE executes as well as endpoint machines that are the source of data messages received by the MFE executing on the host computer. As described above, the MFE on a particular host may be one software forwarding element that implements multiple logical switches and/or logical routers, or multiple separate software forwarding elements executing in the virtualization software (e.g., one or more virtual switches and/or virtual routers).

The process 400 begins by receiving (at 410) a data message from an endpoint machine. Some data messages requiring gateway service processing are destined for a machine in an external network while others are destined for a machine in a local network. In some cases, only north-south traffic (i.e., traffic entering or exiting the datacenter) requires gateway service processing, while in other embodiments some or all east-west traffic also requires such gateway service processing. The external network, in some embodiments, is reached through a provider edge router that connects to the external network.

After receiving the data message, the process 400 performs (at 420) logical network processing to identify the logical networks associated with the source and destination of the received data message. The identification, in some embodiments, is based on a logical network identifier (e.g., a VLAN or VXLAN ID), while in other embodiments, the identification is based on a source and destination IP address of the data message (e.g., an IP subnet to which the IP addresses belong). In yet other embodiments, a logical network is identified based on a MAC address or the port of the MFE through which the data message is received. In some embodiments, a source IP address is in an IP subnet that is used in a policy-based routing rule for the logical network (e.g., src 192.168.2.0/24, dst 0/0 → egress port).
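The subnet-based identification can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each policy-based routing rule carries a source prefix, a destination prefix, and a logical network label, and that the most specific matching source prefix wins; the names PbrRule, match_rule, and the labels "ln-a" and "ln-b" are hypothetical and do not appear in the specification.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PbrRule:
    """Hypothetical policy-based routing rule keyed on source and destination prefixes."""
    src: ipaddress.IPv4Network
    dst: ipaddress.IPv4Network
    logical_network: str  # illustrative label for the logical network


def match_rule(rules: Sequence[PbrRule], src_ip: str, dst_ip: str) -> Optional[PbrRule]:
    """Return the matching rule with the longest source prefix, or None."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    best: Optional[PbrRule] = None
    for rule in rules:
        if src in rule.src and dst in rule.dst:
            # Assumption: prefer the most specific source prefix.
            if best is None or rule.src.prefixlen > best.src.prefixlen:
                best = rule
    return best


# Example rules modeled on "src 192.168.2.0/24, dst 0/0".
RULES = [
    PbrRule(ipaddress.ip_network("192.168.2.0/24"), ipaddress.ip_network("0.0.0.0/0"), "ln-a"),
    PbrRule(ipaddress.ip_network("192.168.3.0/24"), ipaddress.ip_network("0.0.0.0/0"), "ln-b"),
]
```

A message whose source address falls in no configured subnet matches no rule, which corresponds to the case in which no gateway service processing is associated with the message.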

The process 400 determines (at 430) whether the data message requires gateway service processing. In some embodiments, determining that the data message requires gateway service processing is done implicitly based on the inclusion of the gateway service processing in a processing pipeline (e.g., a logical processing pipeline) of a logical forwarding element (e.g., a logical router) associated with the logical network. Determining that the data message requires gateway service processing, in some embodiments, is implicit in identifying (at 440) an MFE to which gateway service processing for the logical network has been assigned. In some embodiments, the MFE is identified by a destination address (e.g., a VTEP IP address associated with the MFE) specified in a policy-based routing rule as discussed above. If the process 400 determines (at 430) that the data message does not require gateway service processing (e.g., there is no gateway service processing associated with a logical network to which the data message belongs, or the data message is not of a type that requires gateway service processing), the process performs (at 460) the logical processing for the data message and forwards the data message to the destination (e.g., through the provider edge router if the destination is external, or to another MFE at another host computer in the datacenter at which the destination is located if the destination is another logical network endpoint in the datacenter), and the process ends. In some embodiments, determining that no gateway service processing is associated with the logical network is implicit in the lack of a policy-based routing rule that applies to the logical network.

If the process 400 determines (at 430) that the data message requires gateway service processing, the process 400 then identifies (at 440) the MFE assigned to provide gateway service processing to the logical network identified (at 420). In some embodiments, the identification of the MFE is based on a policy-based routing rule that is based on the configuration data received from the set of controller computers as discussed above. In other embodiments, rather than using policy-based routing, the MFE is identified based on a table or other data structure that identifies a correspondence between a logical network identifier and an MFE that is assigned to perform gateway service processing for the logical network. In some embodiments, the MFE identified (at 440) as assigned to provide gateway service processing is an edge node that provides centralized gateway service processing for a set of logical networks in the datacenter (e.g., a set of logical switches that are behind a single VPN or NAT IP).
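A minimal Python sketch of the table-based alternative follows. The table contents, the use of a VXLAN network identifier as the key, and the function name assigned_gateway are assumptions made for illustration only.

```python
from typing import Dict, Optional

# Hypothetical table mapping a logical network identifier (here a VXLAN VNI)
# to the VTEP IP address of the MFE assigned to provide gateway services.
GATEWAY_ASSIGNMENTS: Dict[int, str] = {
    5001: "10.0.0.11",
    5002: "10.0.0.12",
    5003: "10.0.0.11",  # one MFE may serve several logical networks
}


def assigned_gateway(vni: int, table: Dict[int, str] = GATEWAY_ASSIGNMENTS) -> Optional[str]:
    """Return the VTEP IP of the assigned MFE, or None if no entry exists."""
    return table.get(vni)
```

In this sketch a missing entry is treated as meaning that no gateway service processing is configured for the logical network, mirroring the implicit determination described for policy-based routing rules.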

After identifying (at 440) the MFE assigned to provide gateway service processing to the logical network identified (at 420), the process determines (at 450) whether the MFE performing the process 400 (“the current MFE”) is the MFE assigned to provide gateway service processing to the logical network. If the current MFE is assigned to provide gateway service processing, the process performs (at 460) the logical processing for the data message. In some embodiments, the logical processing includes logical L2 and L3 switching and routing operations as well as the gateway service processing for the logical network. In some embodiments, the determination of whether the current MFE is the MFE assigned to provide gateway service processing to the logical network is implicit in identifying the current MFE's address as a next hop at an edge services gateway port of the current MFE using a routing entry or policy-based routing rule of the edge services gateway port.

On the other hand, if the current MFE is not the MFE assigned to provide gateway service processing to the logical network, the process 400 forwards (at 470) the data message to the MFE identified as being assigned to provide gateway service processing to the logical network identified (at 420) as being associated with the data message, and the process ends. In some embodiments, the MFE to which the data message is forwarded provides the gateway service processing for the data message and forwards the data message to the provider edge router. One of ordinary skill in the art will appreciate that the operations of process 400 are performed, in some embodiments, in a slightly different order or that some operations are combined into a single operation (e.g., examining a policy-based routing rule may identify an MFE responsible for providing gateway service processing, which implicitly determines that gateway service processing is required for the data message).
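The decisions of operations 430 through 470 can be summarized in a short Python sketch. It is a simplified illustration under stated assumptions, not the claimed implementation: the data message is assumed to already carry the logical network identified at 420, the assignment table is a plain dictionary, and the names DataMessage, Action, and process_400 are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DataMessage:
    logical_network: str  # result of the identification at 420 (assumed given)
    dst_ip: str


class Action(Enum):
    FORWARD_WITHOUT_GATEWAY = "logical processing only (430 -> 460)"
    GATEWAY_LOCALLY = "logical processing plus gateway services (450 -> 460)"
    FORWARD_TO_ASSIGNED_MFE = "forward to the assigned MFE (450 -> 470)"


def process_400(msg: DataMessage, local_vtep: str,
                assignments: Dict[str, str]) -> Tuple[Action, Optional[str]]:
    """Return the action taken and, when forwarding, the assigned MFE's VTEP."""
    # 430 and 440 combined: a missing entry implies no gateway processing.
    assigned = assignments.get(msg.logical_network)
    if assigned is None:
        return Action.FORWARD_WITHOUT_GATEWAY, None
    # 450: is the current MFE the assigned MFE?
    if assigned == local_vtep:
        return Action.GATEWAY_LOCALLY, None
    return Action.FORWARD_TO_ASSIGNED_MFE, assigned
```

Combining 430 and 440 into a single lookup reflects the observation above that examining the rule both identifies the responsible MFE and implicitly determines that gateway service processing is required.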

The MFEs of some embodiments execute on the same machines as endpoints of the logical networks, as shown above in FIG. 1. In some embodiments, the MFEs execute in virtualization software (e.g., a hypervisor) of a host computer. In some embodiments, the logical networks span multiple datacenters (e.g., customer sites) that are connected so that communications between datacenters do not require network address translation (NAT), virtual private networks (VPN), or IPsec encapsulation before being sent to a provider edge router, and gateway service processes can therefore be distributed to MFEs instead of centrally provided at a dedicated edge node.

FIGS. 5A-B illustrate data messages that receive gateway service processing at an MFE executing on the same host computer as the source of a first data message and destination of a return data message. FIGS. 5A-B include datacenters 520A and 520B connected by provider edge routers 530A and 530B through external network 555. Each datacenter includes multiple hosts 500 (though only one host is shown in datacenter 520B) that each execute a set of end machines (e.g., VMs 505) and an MFE 512 (which may execute within virtualization software of the host).

FIG. 5A illustrates a first data message being sent from VM 505A in a first datacenter to VM 505B in a different datacenter (and in a different logical network). The data message marked as “1” is a first data message from VM 505A destined for VM 505B in datacenter 520B (e.g., having a destination IP address of VM 505B or of a virtual IP address that corresponds to VM 505B). The data message “1” is received at the MFE 512A, which is assigned to provide gateway service processing for the logical network to which VM 505A belongs. MFE 512A processes the data message (e.g., in accordance with process 400), performs gateway service processing on the data message, and provides the data message to provider edge router 530A as data message “2”. The provider edge router 530A in datacenter 520A provides the data message to the provider edge router 530B in datacenter 520B through network 555 as data message “3”. Provider edge router 530B of datacenter 520B provides the data message to the MFE 512C executing on the same host computer as the destination VM 505B, which in turn provides the data message to the VM 505B as data message “4”.

FIG. 5B illustrates a return data message sent from VM 505B back to VM 505A. The data message sent from VM 505B retraces the forward path such that data message “5” is sent from VM 505B to the MFE 512C executing on the same host as VM 505B. The MFE forwards the data message to the provider edge router 530B in datacenter 520B, which forwards the data message to provider edge router 530A in datacenter 520A through network 555 as data message “6”. It should be noted that, depending on how the datacenter 520B is configured, the data message may not be sent directly between the provider edge router 530B and the MFE 512C executing on the same host as VM 505B. For instance, these data messages might be sent through a gateway or set of forwarding elements. Upon reaching the provider edge router 530A in the datacenter 520A, that provider edge router 530A provides the data message to the MFE 512A from which it received data message “2” as data message “7”. The MFE 512A in turn processes the data message and forwards it to VM 505A as data message “8”. As all flows related to a particular logical network are processed by the same MFE, the gateway service processing may include stateful services that require the MFE (or host) to maintain state information regarding the data messages processed.
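Why stateful services depend on both directions of a flow reaching the same MFE can be illustrated with a minimal connection-tracking sketch in Python. The StatefulFirewall class and its five-tuple flow key are assumptions chosen for illustration; the specification does not prescribe a particular state representation.

```python
from typing import Set, Tuple

# Flow key: source IP, source port, destination IP, destination port, protocol.
FlowKey = Tuple[str, int, str, int, str]


class StatefulFirewall:
    """Hypothetical stateful service that admits only replies to tracked flows."""

    def __init__(self) -> None:
        self._established: Set[FlowKey] = set()

    def egress(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int,
               proto: str = "tcp") -> bool:
        # Record the outbound flow so that its reply can be recognized later.
        self._established.add((src_ip, src_port, dst_ip, dst_port, proto))
        return True

    def ingress(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                proto: str = "tcp") -> bool:
        # A reply has the endpoints of the recorded flow swapped.
        return (dst_ip, dst_port, src_ip, src_port, proto) in self._established
```

If the return message were delivered to an MFE other than the one that processed the outbound message, the lookup in ingress would find no recorded flow and the reply would be rejected, which is why the return path is directed to the assigned MFE.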

In some embodiments, when a different MFE is identified as the MFE assigned to provide gateway service processing for a logical network, the MFE that received the data message forwards the data message to the different MFE through a tunnel. In various embodiments, the tunnels use VXLAN, GENEVE, STT, or other encapsulation protocols.
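As one concrete example of such encapsulation, the following Python sketch builds and parses the 8-byte VXLAN header defined in RFC 7348 (an 8-bit flags field with the VNI-valid bit 0x08, 24 reserved bits, a 24-bit VNI, and 8 reserved bits). The outer Ethernet, IP, and UDP headers, which would carry the destination VTEP address, are omitted for brevity, and the function names are illustrative.

```python
import struct
from typing import Tuple

VXLAN_UDP_PORT = 4789        # IANA-assigned UDP destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # First word: flags in the top byte; second word: VNI in the top 24 bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the VXLAN header to an inner frame (outer headers omitted)."""
    return vxlan_header(vni) + inner_frame


def decapsulate(payload: bytes) -> Tuple[int, bytes]:
    """Return (vni, inner_frame) from a VXLAN payload."""
    if len(payload) < 8:
        raise ValueError("payload shorter than a VXLAN header")
    flags_word, vni_word = struct.unpack("!II", payload[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return vni_word >> 8, payload[8:]
```

The receiving MFE can use the recovered VNI to associate the inner data message with its logical network before applying gateway service processing.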

FIGS. 6A-B illustrate data messages that receive gateway service processing at an MFE executing on a different host computer than the source of a first data message and destination of a return data message. FIGS. 6A-B include datacenters 620A and 620B connected by provider edge routers 630A and 630B through external network 655. Each datacenter includes multiple hosts 600 (not shown in datacenter 620B) that each execute a set of end machines (e.g., VMs 605) and an MFE 612 (virtualization software executing the MFE not shown).

FIG. 6A illustrates a first data message being sent from VM 605A in a first datacenter to VM 605E in a different datacenter (and in a different logical network). The data message marked as “1” is a first data message from VM 605A destined for VM 605E in datacenter 620B (e.g., having a destination IP address of VM 605E or of a virtual IP address that corresponds to VM 605E). The data message “1” is received at the MFE 612A, which is not assigned to provide gateway service processing for the logical network to which VM 605A belongs. This MFE 612A performs logical processing and determines that MFE 612B is assigned to provide gateway service processing for the logical network to which VM 605A belongs, and thus forwards the data message to MFE 612B as data message “2” based on process 400 (e.g., by tunneling the data message to MFE 612B). The MFE 612B processes the data message to provide gateway service processing in accordance with the configuration data for the logical network, and then provides the data message to the provider edge router 630A in datacenter 620A as data message “3”. The provider edge router 630A in datacenter 620A routes the data message to the provider edge router 630B in datacenter 620B through network 655 as data message “4”. Provider edge router 630B of datacenter 620B routes the data message to the MFE 612C executing on the same host computer as the destination VM 605E, which in turn provides the data message to the VM 605E as data message “5”. It should be noted that, depending on how the datacenter 620B is configured, the data message may not be sent directly between the provider edge router 630B and the MFE 612C executing on the same host as VM 605E. For instance, these data messages might be sent through a gateway or set of forwarding elements.

FIG. 6B illustrates a return data message sent from VM 605E back to VM 605A. The data message sent from VM 605E retraces the forward path such that data message “6” is sent from VM 605E to the MFE 612C executing on the same host as VM 605E. The MFE forwards the data message to the provider edge router 630B in datacenter 620B, which routes the data message to provider edge router 630A in datacenter 620A through network 655 as data message “7”. The provider edge router 630A routes the data message to the MFE 612B from which it received data message “3” as data message “8”. The MFE 612B in turn performs ingress gateway service processing on the data message and forwards the data message to MFE 612A as data message “9” (e.g., by encapsulating the data message). Finally, the MFE 612A forwards the data message to VM 605A as data message “10”.

As described above, in some embodiments, the logical networks include a distributed logical router that is implemented by each MFE (or each MFE hosting a machine connected to the logical network) and a centralized (service) logical router with gateway services configured that is implemented at a particular host computer (e.g., by a particular MFE). Distributed and centralized logical routers are described in more detail in United States Patent Application No. 2016/0226700, which is hereby incorporated by reference. The MFEs, in some embodiments, are also configured with an edge services gateway port that is used as a destination port for traffic destined to external networks and that initiates the performance of certain operations identified in process 400 described above.

FIG. 7 illustrates a process 700 for processing a data message destined for an external network through a distributed logical router and an edge services gateway port. In some embodiments, the set of data messages described in relation to FIGS. 5A-B and 6A-B are the result of the set of logical processing operations described in process 700. In some embodiments, the process 700 is performed by each MFE in the datapath of the data message (in some embodiments, the MFEs on the datapath only include a first-hop MFE and the assigned MFE for the logical network, which could be the first-hop MFE).

The process 700 begins by receiving (at 710) a data message destined for an external network. The data message is received, in some embodiments, at a distributed logical router port from a logical switch to which the source of the data message sends the data message. If the MFE is a first-hop MFE executing on the same host computer as the source of the data message, the data message is received from its source (i.e., a logical network endpoint). If the MFE is an MFE that is assigned to provide a gateway edge service for the logical network from which the data message is sent, the data message is received from an MFE executing on a separate host machine on which the source of the data message executes, in some embodiments. In this case, that MFE would have performed logical router processing to identify that the data message requires gateway services processing and tunneled the data message to the assigned MFE for the logical network.

After receiving (at 710) the data message, the process 700 performs (at 720) logical processing for a distributed logical router that includes identifying an egress logical interface of the distributed router for the data message. As described above, the MFEs are configured to send data messages destined for external networks to an edge services gateway port configured on each MFE. As part of the processing at the edge services gateway port, the MFE determines (at 730) whether the MFE is the assigned MFE. In some embodiments, the determination is based on a policy-based routing rule as described above that specifies that, for the IP subnet to which the source IP address belongs, the MFE is (1) to provide (at 740) gateway service processing (e.g., FW, etc.) and forward (at 750) the data message to the next hop, which in this case would be the provider edge router, or (2) to forward the data message to the next-hop MFE that is assigned to provide the gateway service processing.

If the process 700 determines (at 730) that the MFE is the assigned MFE, it provides (at 740) gateway service processing (e.g., FW, etc.) and forwards (at 750) the data message to the next hop, which in this case would be the provider edge router, and the process 700 ends. If the process 700 determines (at 730) that the MFE is not the assigned MFE, it forwards (at 750) the data message to the next hop, which in this case would be the assigned MFE for the logical network, and the process 700 ends. In some embodiments, providing the data message to the next hop includes encapsulating the data message. The encapsulation, in some embodiments, identifies a tunnel endpoint associated with the host computer on which the MFE assigned for the logical network executes. The MFE assigned to provide the gateway service processing would then perform process 700 to provide the gateway services and forward the data message to the provider edge router on the path to the destination in the external network.
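Because process 700 is performed by each MFE on the datapath, its effect can be sketched in Python as a per-MFE next-hop decision that is repeated until the provider edge router is reached. The Mfe class, the trace helper, the hop limit, and the constant PROVIDER_EDGE are hypothetical names introduced for illustration, and every MFE is assumed to hold the same assignment data.

```python
from dataclasses import dataclass, field
from typing import Dict, List

PROVIDER_EDGE = "provider-edge-router"  # illustrative next-hop label


@dataclass
class Mfe:
    vtep: str
    assignments: Dict[str, str]  # logical network -> VTEP of the assigned MFE
    log: List[str] = field(default_factory=list)

    def process_700(self, logical_network: str) -> str:
        """Return the next hop for a message bound for an external network."""
        assigned = self.assignments[logical_network]  # 730
        if assigned == self.vtep:
            self.log.append("gateway-service")  # 740
            return PROVIDER_EDGE  # 750: next hop is the provider edge router
        return assigned  # 750: next hop is the assigned MFE (via a tunnel)


def trace(mfes: Dict[str, Mfe], first_hop: str, logical_network: str,
          max_hops: int = 4) -> List[str]:
    """Follow next hops from the first-hop MFE to the provider edge router."""
    path = [first_hop]
    while path[-1] != PROVIDER_EDGE:
        if len(path) > max_hops:
            raise RuntimeError("inconsistent gateway assignments")
        path.append(mfes[path[-1]].process_700(logical_network))
    return path
```

With consistent assignments the path contains at most two MFEs, the first-hop MFE and the assigned MFE, matching the scenarios of FIGS. 5A and 6A.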

Many of the above-described features and applications are implemented as software processes that are specified as a set of instructions recorded on a computer readable storage medium (also referred to as computer readable medium). When these instructions are executed by one or more processing unit(s) (e.g., one or more processors, cores of processors, or other processing units), they cause the processing unit(s) to perform the actions indicated in the instructions. Examples of computer readable media include, but are not limited to, CD-ROMs, flash drives, RAM chips, hard drives, EPROMs, etc. The computer readable media do not include carrier waves and electronic signals passing wirelessly or over wired connections.

In this specification, the term “software” is meant to include firmware residing in read-only memory or applications stored in magnetic storage, which can be read into memory for processing by a processor. Also, in some embodiments, multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions. In some embodiments, multiple software inventions can also be implemented as separate programs. Finally, any combination of separate programs that together implement a software invention described here is within the scope of the invention. In some embodiments, the software programs, when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.

FIG. 8 conceptually illustrates a computer system 800 with which some embodiments of the invention are implemented. The computer system 800 can be used to implement any of the above-described hosts, controllers, and managers. As such, it can be used to execute any of the above-described processes. This computer system includes various types of non-transitory machine readable media and interfaces for various other types of machine readable media. Computer system 800 includes a bus 805, processing unit(s) 810, a system memory 825, a read-only memory 830, a permanent storage device 835, input devices 840, and output devices 845.

The bus 805 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the computer system 800. For instance, the bus 805 communicatively connects the processing unit(s) 810 with the read-only memory 830, the system memory 825, and the permanent storage device 835.

From these various memory units, the processing unit(s) 810 retrieve instructions to execute and data to process in order to execute the processes of the invention. The processing unit(s) may be a single processor or a multi-core processor in different embodiments. The read-only-memory (ROM) 830 stores static data and instructions that are needed by the processing unit(s) 810 and other modules of the computer system. The permanent storage device 835, on the other hand, is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the computer system 800 is off. Some embodiments of the invention use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 835.

Other embodiments use a removable storage device (such as a floppy disk, flash drive, etc.) as the permanent storage device. Like the permanent storage device 835, the system memory 825 is a read-and-write memory device. However, unlike storage device 835, the system memory is a volatile read-and-write memory, such as a random access memory. The system memory stores some of the instructions and data that the processor needs at runtime. In some embodiments, the invention's processes are stored in the system memory 825, the permanent storage device 835, and/or the read-only memory 830. From these various memory units, the processing unit(s) 810 retrieve instructions to execute and data to process in order to execute the processes of some embodiments.

The bus 805 also connects to the input and output devices 840 and 845. The input devices enable the user to communicate information and select commands to the computer system. The input devices 840 include alphanumeric keyboards and pointing devices (also called “cursor control devices”). The output devices 845 display images generated by the computer system. The output devices include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD). Some embodiments include devices such as a touchscreen that function as both input and output devices.

Finally, as shown in FIG. 8, bus 805 also couples computer system 800 to a network 865 through a network adapter (not shown). In this manner, the computer can be a part of a network of computers (such as a local area network (“LAN”), a wide area network (“WAN”), or an Intranet), or a network of networks, such as the Internet. Any or all components of computer system 800 may be used in conjunction with the invention.

Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid state hard drives, read-only and recordable Blu-Ray® discs, ultra density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.

While the above discussion primarily refers to microprocessors or multi-core processors that execute software, some embodiments are performed by one or more integrated circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself.

As used in this specification, the terms “computer”, “server”, “processor”, and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms “display” or “displaying” mean displaying on an electronic device. As used in this specification, the terms “computer readable medium,” “computer readable media,” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral or transitory signals.

While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. For instance, several figures conceptually illustrate processes. The specific operations of these processes may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Furthermore, the process could be implemented using several sub-processes, or as part of a larger macro process. Thus, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.

1-20. (canceled)
21. A method of performing gateway services for a first logical network spanning a first plurality of host computers in a datacenter, the method comprising: at the first host computer: receiving, from a machine that is part of the first logical network and executes on the first host computer, a data message destined for an external second network; identifying that the data message requires gateway service processing before being sent to the external network; identifying a second host computer, which executes at least one other machine that is part of the first logical network, as a host computer that will provide the gateway service processing for the first logical network, forwarding that data message to the second host computer to perform the gateway service processing before forwarding the data message to the external network through a gateway of the first logical network.
22. The method of claim 21, wherein the first host computer executes in a first datacenter and the external network is a network of a second datacenter.
23. The method of claim 22, wherein the first and second datacenters are connected through a direct connection that does not require network address translation.
24. The method of claim 21, wherein a return data message from the external network to the first logical network is returned to the second host computer that provides the gateway service processing for the first logical network.
25. The method of claim 21, wherein the plurality of host computers implements a plurality of different logical networks, with at least two different host computers performing gateway services for at least two different logical networks and executing machines associated with the two different logical networks.
26. The method of claim 25, wherein at least one host computer is assigned to provide gateway service processing for multiple logical networks.
27. The method of claim 25, wherein the gateway service processing for the logical networks is assigned to the host computers based on a load balancing operation.
28. The method of claim 25, wherein a controller computer determines the assignment of the gateway service processing for the different logical networks.
29. The method of claim 25, wherein first and second service edge nodes executing on first and second host computers respectively provide gateway service processing for first and second logical networks respectively.
30. The method of claim 21, wherein the gateway service processing comprises stateful gateway service processing.
31. A non-transitory machine readable medium storing a program, which when executed by at least one processing unit, performs gateway services for a first logical network spanning a first plurality of host computers in a datacenter, the program comprising sets of instructions for: at the first host computer: receiving, from a machine that is part of the first logical network and executes on the first host computer, a data message destined for an external second network; identifying that the data message requires gateway service processing before being sent to the external network; identifying a second host computer, which executes at least one other machine that is part of the first logical network, as a host computer that will provide the gateway service processing for the first logical network, forwarding that data message to the second host computer to perform the gateway service processing before forwarding the data message to the external network through a gateway of the first logical network.
32. The non-transitory machine readable medium of claim 31, wherein the first host computer executes in a first datacenter and the external network is a network of a second datacenter.
33. The non-transitory machine readable medium of claim 32, wherein the first and second datacenters are connected through a direct connection that does not require network address translation.
34. The non-transitory machine readable medium of claim 31, wherein a return data message from the external network to the first logical network is returned to the second host computer that provides the gateway service processing for the first logical network.
35. The non-transitory machine readable medium of claim 31, wherein the plurality of host computers implements a plurality of different logical networks, with at least two different host computers performing gateway services for at least two different logical networks and executing machines associated with the two different logical networks.
36. The non-transitory machine readable medium of claim 35, wherein at least one host computer is assigned to provide gateway service processing for multiple logical networks.
37. The non-transitory machine readable medium of claim 35, wherein the gateway service processing for the logical networks is assigned to the host computers based on a load balancing operation.
38. The non-transitory machine readable medium of claim 35, wherein a controller computer determines the assignment of the gateway service processing for the different logical networks.
39. The non-transitory machine readable medium of claim 35, wherein first and second service edge nodes executing on first and second host computers respectively provide gateway service processing for first and second logical networks respectively.
40. The non-transitory machine readable medium of claim 31, wherein the set of instructions for identifying the second host computer comprises a set of instructions for using a policy-based routing entry to identify the second host computer.